Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
It is challenging to deploy 3D Convolutional Neural Networks (3D CNNs) on mobile devices, specifically if both real-time execution and high inference accuracy are in demand, because the increasingly large model size and complex model structure of 3D CNNs usually require tremendous computation and memory resources. Weight pruning is proposed to mitigate this challenge. However, existing pruning is either not compatible with modern parallel architectures, resulting in long inference latency or subject to significant accuracy degradation. This paper proposes an end-to-end 3D CNN acceleration framework based on pruning/compilation co-design called Mobile-3DCNN that consists of two parts: a novel, fine-grained structured pruning enhanced by a prune/Winograd adaptive selection (that is mobile-hardware-friendly and can achieve high pruning accuracy), and a set of compiler optimization and code generation techniques enabled by our pruning (to fully transform the pruning benefit to real performance gains). The evaluation demonstrates that Mobile-3DCNN outperforms state-of-the-art end-to-end DNN acceleration frameworks that support 3D CNN execution on mobile devices, Alibaba Mobile Neural Networks and Pytorch-Mobile with speedup up to 34 × with minor accuracy degradation, proving it is possible to execute high-accuracy large 3D CNNs on mobile devices in real-time (or even ultra-real-time).more » « lessFree, publicly-accessible full text available July 22, 2026
-
Vision transformers (ViTs) have recently obtained success in many applications, but their intensive computation and heavy memory usage at both training and inference time limit their generalization. Previous compression algorithms usually start from the pre-trained dense models and only focus on efficient inference, while time-consuming training is still unavoidable. In contrast, this paper points out that the million-scale training data is redundant, which is the fundamental reason for the tedious training. To address the issue, this paper aims to introduce sparsity into data and proposes an end-to-end efficient training framework from three sparse perspectives, dubbed Tri-Level E-ViT. Specifically, we leverage a hierarchical data redundancy reduction scheme, by exploring the sparsity under three levels: number of training examples in the dataset, number of patches (tokens) in each example, and number of connections between tokens that lie in attention weights. With extensive experiments, we demonstrate that our proposed technique can noticeably accelerate training for various ViT architectures while maintaining accuracy. Remarkably, under certain ratios, we are able to improve the ViT accuracy rather than compromising it. For example, we can achieve 15.2% speedup with 72.6% (+0.4) Top-1 accuracy on Deit-T, and 15.7% speedup with 79.9% (+0.1) Top-1 accuracy on Deit-S. This proves the existence of data redundancy in ViT. Our code is released at https://github.com/ZLKong/Tri-Level-ViTmore » « less
An official website of the United States government

Full Text Available